52 research outputs found

    Making an image worth a thousand visual words

    Automatic dissimilarity analysis between images depends heavily on descriptors that characterize the images' content as compact and discriminative features. This work investigates the use of visual dictionaries to represent and retrieve local image features using the popular Bag-of-Visual-Words modeling approach. We evaluated the impact of different parameters on the construction of this model, showing that an image can be effectively described using fewer than a thousand visual words.
    Funding: FAPESP; CAPES; STIC-AmSud; RESCUER project, funded by the European Commission (Grant 614154) and by the National Council for Scientific and Technological Development (CNPq/MCTI).
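
The Bag-of-Visual-Words pipeline the abstract describes can be sketched in a few lines: cluster local descriptors into a small visual dictionary, then describe each image as a histogram over those words. The data and dictionary size below are toy assumptions, and the k-means is a bare Lloyd loop in NumPy rather than any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_dictionary(descriptors, k, iters=10):
    """Cluster local descriptors into k visual words (plain Lloyd k-means)."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bovw_histogram(image_descriptors, centers):
    """Quantize an image's descriptors to the nearest word; return a normalized histogram."""
    d = np.linalg.norm(image_descriptors[:, None] - centers[None], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# toy "local features": 200 descriptors of dimension 8
descs = rng.normal(size=(200, 8))
dictionary = build_dictionary(descs, k=16)
hist = bovw_histogram(descs[:50], dictionary)
```

The image signature is then the normalized histogram, which two images can compare with any vector distance.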

    Preface

    7th International Conference on Similarity Search and Applications (SISAP). Los Cabos, Mexico, 29-31 October 2014.

    Quantitative temporal association rule mining by genetic algorithm

    Association rule mining has shown great potential to extract knowledge from multidimensional data sets. However, existing methods in the literature do not apply effectively to quantitative temporal data. This article extends the concepts of association rule mining from the literature and, based on the extended concepts, presents QTARGA (Quantitative Temporal Association Rule mining by Genetic Algorithm), a method to mine rules from multidimensional temporal quantitative data sets using a genetic algorithm. Experiments with QTARGA on four real data sets show that it mines several high-confidence rules in a single execution of the method.
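
As a rough illustration of GA-based quantitative rule mining in general (not the authors' QTARGA), the sketch below evolves an interval [lo, hi] on a numeric attribute x for the rule "x in [lo, hi] => y = 1", scoring each chromosome by confidence weighted by support. The data, encoding, and operators are all toy assumptions.

```python
import random

random.seed(1)

# toy data: y is 1 exactly when x falls in [0.4, 0.6]
xs = [random.random() for _ in range(300)]
data = [(x, 1 if 0.4 <= x <= 0.6 else 0) for x in xs]

def fitness(chrom):
    """Confidence of the rule 'x in [lo, hi] => y = 1', weighted by support."""
    lo, hi = sorted(chrom)
    covered = [(x, y) for x, y in data if lo <= x <= hi]
    if not covered:
        return 0.0
    support = len(covered) / len(data)
    confidence = sum(y for _, y in covered) / len(covered)
    return confidence * support

def evolve(pop_size=30, gens=40):
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            # crossover: mix bounds of two elite parents, then mutate slightly
            a, b = random.sample(elite, 2)
            child = (a[0], b[1])
            child = tuple(min(1.0, max(0.0, g + random.gauss(0, 0.05)))
                          for g in child)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
lo, hi = sorted(best)
```

A real quantitative temporal miner would encode several attributes plus temporal constraints per chromosome, but the fitness-driven interval search is the same idea.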

    Improving metric access methods with bucket files

    Modern applications deal with complex data, and retrieval by similarity plays an important role in most of them. Complex data whose primary comparison mechanisms are similarity predicates are usually immersed in metric spaces. Metric Access Methods (MAMs) exploit the metric space properties to divide the space into regions and efficiently process similarity queries, such as range and k-nearest neighbor queries. Existing MAMs use homogeneous data structures to improve query execution, following the same techniques employed by traditional methods developed to retrieve scalar and multidimensional data. In this paper, we combine hashing and hierarchical ball-partitioning approaches into a hybrid index tuned to improve similarity queries over complex data sets, with search algorithms that reduce total execution time by aggressively reducing the number of distance calculations. We applied our technique to the Slim-tree and performed experiments over real data sets showing that the proposed technique reduces the execution time of both range and k-nearest neighbor queries by at least half compared to the original Slim-tree. Moreover, the technique is general enough to be applied to many existing MAMs.
    Funding: CAPES; CNPq; FAPESP.
    Presented at the 8th International Conference on Similarity Search and Applications (SISAP), Glasgow, 2015.
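
The distance-calculation savings the abstract mentions come from metric-space pruning. The generic mechanism (pivot filtering via the triangle inequality, not the paper's specific Slim-tree variant) can be sketched as follows; the data and pivot choice are toy assumptions.

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance, a valid metric

def range_query(objects, pivot, pivot_dists, q, radius):
    """Range query using precomputed distances to a pivot for pruning."""
    dq_p = dist(q, pivot)
    calls, result = 1, []
    for obj, d_op in zip(objects, pivot_dists):
        # triangle inequality: |d(q,p) - d(o,p)| <= d(q,o),
        # so if the lower bound exceeds the radius, skip the metric call
        if abs(dq_p - d_op) > radius:
            continue  # pruned without computing dist(q, obj)
        calls += 1
        if dist(q, obj) <= radius:
            result.append(obj)
    return result, calls

objects = [(float(i), 0.0) for i in range(100)]
pivot = (0.0, 0.0)
pivot_dists = [dist(o, pivot) for o in objects]  # built once, at insertion time
hits, calls = range_query(objects, pivot, pivot_dists, (10.0, 0.0), 2.5)
```

Here only 6 of 101 possible distance computations are performed; hierarchical MAMs apply the same inequality at every node to discard whole regions at once.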

    RSC: mining and modeling temporal activity in social media

    Can we identify patterns of temporal activity caused by human communications in social media? Is it possible to model these patterns and tell whether a user is a human or a bot based only on the timing of their postings? Social media services allow users to make postings, generating large datasets of human activity time-stamps. In this paper we analyze time-stamp data from social media services and find that the distribution of postings' inter-arrival times (IATs) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes, and (iv) bimodal distribution. Based on our findings, we propose Rest-Sleep-and-Comment (RSC), a generative model able to match all four discovered patterns. We demonstrate the utility of RSC by showing that it can accurately fit real time-stamp data from Reddit and Twitter. We also show that RSC can be used to spot outliers and detect users with non-human behavior, such as bots. We validate RSC using real data consisting of over 35 million postings from Twitter and Reddit. RSC consistently provides a better fit to real data and clearly outperforms existing models for human dynamics. RSC was also able to detect bots with a precision higher than 94%.
    Funding: FAPESP; CNPq; CAPES; STIC-AmSud; RESCUER project, funded by the European Commission (Grant 614154) and by CNPq/MCTI (Grant 490084/2013-3); JSPS KAKENHI Grant-in-Aid for JSPS Fellows #242322; National Science Foundation Grants CNS-1314632 and IIS-1408924; ARO/DARPA Contract Number W911NF-11-C-0088; Army Research Laboratory Cooperative Agreement Number W911NF-09-2-005.

    Compact distance histogram: a novel structure to boost k-nearest neighbor queries

    The k-Nearest Neighbor query (k-NNq) is one of the most useful similarity queries. Elaborate k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer. Therefore, estimating a suitable starting radius is of major importance to accelerate k-NNq execution. This paper presents a new technique to estimate a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. Such structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets and/or a maximum allowed error. The covering radius of a k-NNq is estimated based on the relationship between the query element and the CDHs' joint frequencies. The paper presents a complete specification of CDH-kNN, including CDH construction and radius estimation. Extensive experiments on both real and synthetic datasets highlighted the efficiency of our approach, showing that it was up to 72% faster than existing algorithms, outperforming every competitor in all the setups evaluated. In fact, the experiments showed that our proposal was just 20% slower than the theoretical lower bound.
    Funding: FAPESP; CNPq; CAPES; STIC-AmSud.
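
The core idea behind a pivot-based distance histogram can be sketched without the paper's piecewise-linear compression: from the distribution of distances to a pivot, derive a radius expected to cover about k neighbors. The sketch below keeps the full sorted distance list (a "histogram" with one bucket per object) and uses the triangle-inequality lower bound; the data is a toy assumption.

```python
import math

def build_distance_index(objects, pivot):
    """Sorted distances from every object to the pivot (an uncompressed 'histogram')."""
    return sorted(math.dist(o, pivot) for o in objects)

def estimate_knn_radius(sorted_dists, query, pivot, k):
    """Smallest r such that at least k objects satisfy |d(q,p) - d(o,p)| <= r.
    By the triangle inequality |d(q,p) - d(o,p)| <= d(q,o), so the true
    k-NN covering radius is at least this estimate."""
    dq = math.dist(query, pivot)
    lower_bounds = sorted(abs(dq - d) for d in sorted_dists)
    return lower_bounds[k - 1]

objects = [(float(i), 0.0) for i in range(100)]
pivot = (0.0, 0.0)
dists = build_distance_index(objects, pivot)
r0 = estimate_knn_radius(dists, query=(50.0, 0.0), pivot=pivot, k=5)
```

Seeding a k-NN search with `r0` instead of an infinite radius lets the index prune from the very first node visited; the CDH replaces the full sorted list with a compact piecewise-linear approximation of the same distribution.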

    Speeding up the combination of multiple descriptors for different boundary conditions

    Content-based complex data retrieval is becoming increasingly common in many types of applications. The content of these data is represented by intrinsic characteristics extracted from them, which, together with a distance function, allow similarity queries. Aiming to reduce the "semantic gap", the disagreement between the computational representation of the extracted low-level features and how these data are interpreted by human perception, the use of multiple descriptors has been the subject of several studies. This paper proposes a new method to combine multiple descriptors for different boundary conditions, in which the balancing is carried out in pairs, starting with the best candidate descriptor. In the experiments, the proposed method achieved a computational cost up to 3650 times smaller than the exhaustive search for the best linear combination of descriptors, while keeping almost the same average precision, with variations lower than 0.9%.
    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
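
The pairwise balancing idea, as opposed to searching all weight combinations jointly, can be illustrated with a toy sketch (this is the general greedy scheme, not the paper's exact method): fix the best descriptor's weight and tune one combining weight per additional descriptor, pair by pair, over a coarse grid. The distances and quality function below are made up.

```python
def combine(d_values, weights):
    """Weighted sum of per-descriptor distances for one object pair."""
    return sum(w * d for w, d in zip(weights, d_values))

def tune_pairwise(descriptor_dists, quality, steps=11):
    """descriptor_dists: one distance list per descriptor, ordered best-first.
    Greedily picks each additional weight on a [0, 1] grid: O(n * steps)
    evaluations instead of steps**(n-1) for the exhaustive joint search."""
    weights = [1.0] + [0.0] * (len(descriptor_dists) - 1)
    for i in range(1, len(descriptor_dists)):
        best_w, best_q = 0.0, None
        for s in range(steps):
            weights[i] = s / (steps - 1)
            combined = [combine(col, weights) for col in zip(*descriptor_dists)]
            q = quality(combined)
            if best_q is None or q > best_q:
                best_w, best_q = weights[i], q
        weights[i] = best_w
    return weights

# toy setup: descriptor 0 is informative, descriptor 1 is pure noise
d0 = [0.1, 0.5, 0.9, 0.3]
d1 = [0.9, 0.1, 0.2, 0.8]

def quality(combined):
    # reward agreement with the "ground-truth" distances given by d0 alone
    return -sum((c - g) ** 2 for c, g in zip(combined, d0))

w = tune_pairwise([d0, d1], quality)
```

With a quality measure driven by retrieval precision instead of this toy objective, the same greedy loop yields the pairwise cost advantage the abstract quantifies.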

    Combining diversity queries and visual mining to improve content-based image retrieval systems: the DiVI method

    This paper proposes a new approach to improve similarity queries with diversity, the Diversity and Visually-Interactive method (DiVI), which employs Visual Data Mining techniques in Content-Based Image Retrieval (CBIR) systems. DiVI empowers the user to understand how the measures of similarity and diversity affect their queries, and increases the relevance of CBIR results according to the user's judgment. An overview of the image distribution in the database is shown to the user through multidimensional projection. The user interacts with the visual representation, changing the projected space or the query parameters according to his/her needs and previous knowledge. DiVI takes advantage of the user's activity to transparently reduce the semantic gap faced by CBIR systems. Empirical evaluation shows that DiVI increases the precision of querying by content and also increases the applicability and acceptance of similarity with diversity in CBIR systems.
    Funding: FAPESP; CNPq; CAPES; RESCUER project (European Commission Grant 614154 and CNPq/MCTI Grant 490084/2013-3).

    Design and evaluation case study: evaluating the kinect device in the task of natural interaction in a visualization system

    We verify the hypothesis that Microsoft's Kinect device is suited to defining more efficient interaction than the commodity mouse in the context of information visualization. To this end, we used Kinect during the interaction design and evaluation of an information visualization application (over agrometeorological, car, and flower datasets). The devices were tested on a visualization technique based on clouds of points (multidimensional projection) that can be manipulated by rotation, scaling, and translation. The design followed the Participatory Design technique (ISO 13407), and the evaluation comprised an extensive set of usability tests. In the tests, users reported high satisfaction scores (easiness and preference) but also exhibited low efficiency scores (time and precision). In the specific context of multidimensional-projection visualization, our conclusion is that, with respect to user acceptance, Kinect is an adequate device for natural interaction; but for desktop-based production it still cannot compete with the long-established mouse design.
    Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq 560104/2010-3); Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP 2011/13724-1); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

    Applying texture-spectrum features for lung-CAD purposes

    In this paper we propose using the Texture Spectrum (T-Spec) as the feature-extraction component of Computer-Aided Diagnosis systems, particularly for medical lung images. We claim that employing T-Spec for this purpose offers several advantages for representing texture at low computational cost. Moreover, when combined with approaches from the literature, T-Spec can increase the capability of representing texture across several domains. We tested our approach on two distinct datasets, classifying their instances using the T-Spec features along with the Random Forest classifier. Our results showed that T-Spec achieves good accuracy and requires less computational time and resources. When combined with literature methods, T-Spec achieved a higher accuracy rate for lung images than previous approaches.
    Funding: FAPESP; CNPq; CAPES.
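
The classic Texture Spectrum feature the paper builds on maps each pixel's 3x3 neighborhood to a "texture unit" number: each of the 8 neighbors contributes 0, 1, or 2 depending on whether it is below, equal to, or above the center, and the image is described by the histogram (spectrum) of those numbers over all interior pixels. A minimal sketch, on a toy image:

```python
def texture_unit(center, neighbors):
    """Texture unit number in [0, 6560] for one 3x3 neighborhood.
    neighbors: the 8 surrounding intensities in a fixed order."""
    ntu = 0
    for i, v in enumerate(neighbors):
        e = 0 if v < center else (1 if v == center else 2)
        ntu += e * 3 ** i
    return ntu

def texture_spectrum(img):
    """img: 2D list of intensities; returns the 6561-bin texture spectrum."""
    h, w = len(img), len(img[0])
    spectrum = [0] * 3 ** 8
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            nb = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
            spectrum[texture_unit(img[r][c], nb)] += 1
    return spectrum

flat = [[5] * 4 for _ in range(4)]  # uniform toy image
spec = texture_spectrum(flat)
```

On a uniform image every neighborhood yields the all-equal unit (every e = 1, so the unit is (3**8 - 1) // 2 = 3280), and the 6561-dimensional spectrum is the feature vector handed to the classifier.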